#open-weight LLMs
13/08/2025
Reinforcement Learning Unlocks Open-Weight LLMs for Long-Horizon Software Engineering
Nebius AI and Humanoid adapted DAPO-based reinforcement learning to train an open-weight Qwen2.5 agent for long-horizon software engineering, reaching 39% Pass@1 on SWE-bench Verified without teacher supervision.